Driving your own car, anyone?

Having a chauffeur was more than a luxury. It was a necessity. So many things could go wrong, requiring a technician's skills.

And it limited who could afford to own and use a car.


Self-Service Revolution

"The worldwide demand for cars will not exceed one million, even if just for a scarcity of available chauffeurs."

Gottlieb Daimler, Inventor, 1901


Sticking to the technology status quo?

"There is no reason for any individual to have a computer in his home."

Ken Olsen, Founder and CEO of Digital Equipment Corp, 1977 at Convention of the World Future Society


Programmer

A person skilled in designing and developing programs.

* Programmers write solutions (programs) in a programming language.
* Requires intersection of programming skills (how?) and domain knowledge (what?)
* Programming languages themselves are the subject of a design activity.
* Facts and opinions abound: usability, expressiveness, correctness by construction, readability vs. writability, simplicity, style, ...


Properties of Programming Languages

Tools need to match the problem space, the audience expected to use the tool, and the expectation space of the desired outcome when using the tool.


* Read-Only Languages
* SQL (Structured Query Language): many learn to read SQL, only a few can write non-trivial SQL


* Write-Only Languages
* Perl: many learn to write scripts, but most cannot even read what they wrote themselves a day ago


* Impedance Mismatch 
* "Ceremony" or lack of expressiveness force 
cumbersome formulations of solutions in a given 
problem domain 


* Requirements Mismatch
* Functionally good expressions end up failing expectations of performance, security, etc.


Law of the Instrument

"I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail."

Abraham Maslow, Psychologist, 1966


Programming Languages

Given a computer with some primitive operations and a problem to solve: formulate a composition of instructions to the computer that solve the problem.

* Instructions can be very low-level (close to the machine's primitive operations)
* Instructions can be very high-level (close to the problem domain at hand)
* Most languages strike a balance
* Too low-level (limited audience, limited target machines)
* Too high-level (limited audience, limited problem domains)
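
To make the low-level versus high-level contrast concrete, here is a minimal Python sketch. Python is used purely for illustration; the function names and the sample data are hypothetical and do not come from the talk. Both functions solve the same problem, once by spelling out index and accumulator steps and once in terms closer to the problem domain.

```python
# Hypothetical sample data: order amounts.
amounts = [50, 150, 200, 75]

def total_large_low_level(values, threshold):
    """Spell out every step: index, bounds check, accumulator."""
    total = 0
    i = 0
    while i < len(values):
        if values[i] > threshold:
            total = total + values[i]
        i = i + 1
    return total

def total_large_high_level(values, threshold):
    """State what is wanted: the sum of the values above the threshold."""
    return sum(v for v in values if v > threshold)

low = total_large_low_level(amounts, 100)
high = total_large_high_level(amounts, 100)
```

Both formulations give the same total; they differ only in how much machinery the author has to manage by hand.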


[Figure: language spectrum from machine-specific to domain-specific, with "general purpose" in between; plotted dimensions: skills, interest, audience]


Programmer 


A person skilled in 
designing and developing 
programs 


The chauffeur of your 
computer! 


* Why not "drive" your own computer to go 
where you want to go? 
* This is not about "using" a computer application, in 
the simple sense 


* Why not write the programs you need to get 
your job done, yourself? 


* This is not about "programming" a computer 
either, in the fullest sense 


* Why not master a programming language? 
* If the language is Abstract Algebra, you'll be in 
trouble; if it is Pidgin, you are in trouble too 


Self-Service Programming

... cars that ... people can learn to drive ... to the limit of ...


* Query by Example 
Moshé M. Zloof, IBM Research, mid-1970s 


* Generalizes to Programming by Example 
* Using direct manipulation, change results of a 
program, causing the system to adjust that 
program. 


* Users can watch the effect on the underlying
program and learn from that.

* Some users pick up ways to change their programs 
directly, naturally learning the underlying 
programming language. 

* Requires uniform and simple languages. 


Audience-Specific Programming Languages

Consider a variety of personas that characterize how groups of people get their tasks done. Consider a set of personas that fall into comparable needs/skills categories. Call that an audience.

* Languages that strive to be "general purpose" end up being not quite right at most anything.
* To compensate, such languages develop a large arsenal of specialized but overlapping capabilities.
* The ideal maximized audience is subdued by complexity.
* Larger audiences can be served with simpler languages to either side of the "general purpose" point.


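[Figure: language spectrum from machine-specific to domain-specific, with "audience specific" regions to either side of the general-purpose point; plotted dimensions: skills, interest, audience, complexity]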


Anyone can 
drive a car 


Downside: everyone does drive a car.


"The trouble with programmers is that you can never tell what a programmer is doing until it's too late."


Seymour Cray 


Anyone can 
write a program 


For a suitable set of 
domains and requirements. 


Example: Power Query, a part of Microsoft Power BI, aims at Excel users that gather, combine, and analyze data from a wide variety of sources.

[Screenshot: the Power Query Query Editor showing an "Orders" query]


"M" - a simple programming language

Again, an example - the Power Query Expression Language (often referred to as "M" for short).


* Target audience is advanced Information
Workers (Analysts etc.), Data Stewards
* Specifically, top 10% (ish) of Excel users
* Litmus test: benefits from today's Excel formulas


* For that audience, the language should be 
* Simple, easy to remember 


* Easy to read and write; limited syntax, little use 
of non-standard symbols 


* Powerful; no cliffs for advanced user 


* Wide range of "data models" (relational, 
hierarchical, semi-structured, etc.) 


Uniform simple syntax

The syntax of a language defines the form a valid expression in that language takes.

It does not, as such, define the meaning of such an expression.

[Code samples compared on the slide: T-SQL, C# LINQ syntax, C# LINQ pattern]


Semantics to meet expectations & requirements

For a language, its semantics ... the meaning of an expression ... uniform principle:


* Dynamic 
* "M" programs only fail when reaching an invalid evaluation state 
* Static checking, beyond syntax, is an option for tools 


* Functional (mostly) 


* Mostly deterministic: no direct side effects; mostly referentially 
transparent; once calculated, all values are immutable 


* External data is stream-processed (not necessarily buffered) and can 
be non-repeatable; error handling can expose non-determinism 


* Higher-order 
* Functions, closures, and types are also values 
* Nested application and conditionals as only forms of "control flow" 


* Optionally typed 


* Mostly optional yet expressive type system; very limited runtime 
checking of types 
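
As a rough analogy for the "dynamic" and "functional" bullets, the following Python sketch (not M; all names are hypothetical) shows three things: an expression whose invalid branch is never reached and therefore never fails, a transformation that returns a new value instead of changing its input, and a function passed around as an ordinary value.

```python
def safe_ratio(numerator, denominator):
    # Conditional expression: only the selected branch is evaluated, so the
    # division is never attempted when the denominator is zero.
    return None if denominator == 0 else numerator / denominator

def with_markup(prices, rate):
    # Builds a new tuple; the input tuple is immutable and left untouched.
    return tuple(p * (1 + rate) for p in prices)

def apply_twice(f, x):
    # Functions are values: they can be passed in and applied like any other.
    return f(f(x))

prices = (100.0, 200.0)
marked_up = with_markup(prices, 0.5)
no_failure = safe_ratio(1, 0)
ratio = safe_ratio(1, 4)
quadrupled = apply_twice(lambda n: n * 2, 3)
```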


No control-flow primitives ... Say again?

Control flow in a programming language directs the flow of program execution based on state observations.

Examples include constructs for looping (iteration), branching (case selection), and even jumping ("goto").

The example names a function. If applied to a table and a predicate, it returns a new table with rows that meet that predicate.

This function is higher-order: it takes a function as its argument.

Its second argument is a function that takes a single row and determines whether that row should be selected (or dropped).

In the example, the predicate function is anonymous; it has no name and is defined right where it is needed.

* "M" discourages explicit control flow (even recursion!) and prefers higher-order application
* Many library functions take functions as arguments
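
A rough Python analogy of this pattern, offered as a sketch only: `select_rows`, the `orders` data, and the field names are hypothetical and are not part of the M library. The caller writes no loop or branch; it hands an anonymous predicate to a higher-order function.

```python
def select_rows(table, predicate):
    """Return a new table (list of row dicts) with the rows that meet the predicate."""
    return [row for row in table if predicate(row)]

orders = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 150},
    {"id": 3, "amount": 200},
]

# The predicate is an anonymous function, defined right where it is needed.
large_orders = select_rows(orders, lambda row: row["amount"] > 100)
large_ids = [row["id"] for row in large_orders]
```

The input table is left unchanged; the result is a new list holding only the selected rows.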


Making the most common case simple

A common pattern is that higher-order functions take unary functions (single-parameter functions) as arguments. Think items in a list, rows in a table, fields in a record.

* Often, those parameter functions are unary
* A special syntactic form helps construct unary function values
* An 'each' expression is just shorthand for a unary function
* The single parameter of an 'each' function is named _
* For conciseness, the _ can be omitted when accessing fields or columns (this is the only case of syntactic finesse in M)
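
Python has no 'each' form, but the idea can be approximated with a lambda whose single parameter is called _. The sketch below is an analogy under that assumption; the data and names are hypothetical. It applies such unary functions to items in a list, rows in a table, and fields in a record.

```python
# Unary functions written with a parameter named _, mirroring the naming
# convention described for 'each' expressions.
double = lambda _: _ * 2
is_large = lambda _: _["amount"] > 100

numbers = [1, 2, 3]
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
record = {"x": 3, "y": 4}

doubled_items = list(map(double, numbers))                     # items in a list
large_rows = list(filter(is_large, rows))                      # rows in a table
doubled_fields = {k: double(v) for k, v in record.items()}     # fields in a record
```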


Evaluation Model

The evaluation model of a language determines how expressions are evaluated. This can be seen as a refinement of the language's semantics.

* Expressions evaluate to values in a context
* The context binds names to values
* Function application is strict
* No Excel-style if(condition, true-expression, false-expression)
* "M" has an if-expression (the only admission to control flow):
if condition then true-expression else false-expression
* Evaluation is eager except for value construction
* Construction of structured values (records, lists, tables) is lazy
* Can deal with infinite lists and tables
* Can deal with partial records, lists, and tables (values containing embedded errors only show when accessed)
* Evaluation 'fails fast' on hard errors
* Simple model to raise and handle soft errors within M
* The model of M combines a very simple (but not trivial) language with a uniform library of functions
* Adding support for a new domain entails adding functions
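
The lazy construction of structured values can be approximated in Python. This is a sketch under stated assumptions, not a description of the M engine: `naturals` and `LazyRecord` are hypothetical helpers. It shows an infinite list consumed only as far as needed, and a record whose embedded error surfaces only when the faulty field is accessed.

```python
from itertools import count, islice

def naturals():
    """An infinite 'list': items are produced only when a consumer asks for them."""
    return count(0)

# Only the first five squares are ever computed.
first_five_squares = list(islice((n * n for n in naturals()), 5))

class LazyRecord:
    """Hypothetical record whose fields are computed on first access."""

    def __init__(self, **thunks):
        self._thunks = thunks   # field name -> zero-argument function
        self._values = {}       # cache of fields already computed

    def __getitem__(self, name):
        if name not in self._values:
            self._values[name] = self._thunks[name]()
        return self._values[name]

# The 'bad' field holds an embedded error; constructing the record succeeds.
record = LazyRecord(good=lambda: 42, bad=lambda: 1 / 0)
good_value = record["good"]

try:
    record["bad"]
    bad_failed = False
except ZeroDivisionError:
    bad_failed = True
```

Accessing the healthy field works even though a sibling field would fail, which is the behavior the "partial records" bullet describes.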


Going Places

Opportunities related to Power BI

* Programming-by-example
* A tool like Power Query "knows" how to generate M for some subset of functions in the library
* SharePoint Online: Scheduled Refresh
* Uploaded Excel workbook that uses Power Query connections
* Refresh can draw on on-premise data sources through Hybrid Data Delivery (combined Hybrid Proxy in cloud and Information Management Gateway server on premise)
* Gateway runs M mashup provider
* Ongoing work
* Today, SQL and Oracle only
* Soon, most PQ sources (but OAuth ones), through gateway
* Later, all sources and tier-splitting for cloud sources


New Opportunity: Mini Expression Language

* Larger declarative forms often require embedded mini languages for expressions
* M itself is a small language and the library can be tailored
* Guaranteed safe semantics are often a concern to avoid the cost of executing in a container
* M is safe except for resource exhaustion (stack overflow, heap overflow)
* M engine can be parameterized to bound stack depth (not in product bits today)
* M library can be subset to control memory pressure


New Opportunity: Data Ingress

* Data ingress pipelines in Cosmos, Hadoop, etc.
* Use M scripts to describe binary formats
* Use generic, M-driven, extractor/deserializer to crack wide range of binary formats using such M scripts
* Work in progress
* M library to describe and crack many kinds of binary formats shipped
* Generic extractors and deserializers to come



BinaryFormat functions

* BinaryFormat.Choice(binaryFormat as function,
choice as function, optional combine as function,
optional type as nullable type) as function

1. The binary format that will be used to read the value
2. The choice function that selects the next binary format
3. The function to combine the first value with the second value that was read
4. The type of binary format that will be returned by the choice function; type any, type list, or type binary may be specified

* For types list or binary, may return a streaming instead of a buffered value, which may reduce memory necessary to read format

* The binary format value produced is processed in five stages:
1. The specified binaryFormat is used to read a value
2. The value is passed to the choice function
3. The choice function inspects the value and returns a second binary format
4. The second binary format is used to read a second value
5. The second value is returned

Cracking Binary Formats

    let
        fileContents = #binary(          // Sample file, viewed in hex
            {0x00, 0x00, 0x00, 0x02,     // number of points (2)
             0x00, 0x03, 0x00, 0x04,     // point (x=3, y=4)
             0x00, 0x05, 0x00, 0x06}),   // point (x=5, y=6)
        pointFormat = BinaryFormat.Record([
            x = BinaryFormat.SignedInteger16,
            y = BinaryFormat.SignedInteger16
        ]),
        fileFormat = BinaryFormat.Choice(
            BinaryFormat.UnsignedInteger32,
            (count) => BinaryFormat.List(pointFormat, count))
    in
        fileFormat(fileContents)

Resulting M value, a list of records:

    {
        [x = 3, y = 4],
        [x = 5, y = 6]
    }
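
The same idea, formats as functions that are composed into larger formats, can be sketched in Python. This is an illustrative analogy, not the M implementation: `atom`, `record`, `list_of`, and `choice` are hypothetical helpers built on Python's standard `struct` module, and big-endian byte order is assumed because it matches the sample bytes.

```python
import struct

# A "binary format" here is a function: (data, offset) -> (value, new_offset).

def atom(fmt):
    """Format that reads one fixed-size value described by a struct format string."""
    size = struct.calcsize(fmt)
    def read(data, offset):
        (value,) = struct.unpack_from(fmt, data, offset)
        return value, offset + size
    return read

# Big-endian atoms, matching the byte layout of the sample file.
unsigned_integer32 = atom(">I")
signed_integer16 = atom(">h")

def record(fields):
    """Format that reads each named field in order and returns a dict."""
    def read(data, offset):
        result = {}
        for name, fmt in fields.items():
            result[name], offset = fmt(data, offset)
        return result, offset
    return read

def list_of(item_format, count):
    """Format that reads `count` consecutive items."""
    def read(data, offset):
        items = []
        for _ in range(count):
            item, offset = item_format(data, offset)
            items.append(item)
        return items, offset
    return read

def choice(first_format, choose):
    """Read a value, let `choose` pick the next format from it, then read with that."""
    def read(data, offset):
        first, offset = first_format(data, offset)
        return choose(first)(data, offset)
    return read

file_contents = bytes([
    0x00, 0x00, 0x00, 0x02,  # number of points (2)
    0x00, 0x03, 0x00, 0x04,  # point (x=3, y=4)
    0x00, 0x05, 0x00, 0x06,  # point (x=5, y=6)
])

point_format = record({"x": signed_integer16, "y": signed_integer16})
file_format = choice(unsigned_integer32, lambda count: list_of(point_format, count))

points, end_offset = file_format(file_contents, 0)
```

As in the staged description of the choice format, the count is read first, the choice function turns it into a list format of that length, and only the second value (the list of points) is returned.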



Cracking Binary Formats

Rich library

* Read atom of specified size and type
* Byte, SignedInteger16, UnsignedInteger16, SignedInteger32, UnsignedInteger32, SignedInteger64, UnsignedInteger64, Single, Double, Decimal, 7BitEncodedSignedInteger, 7BitEncodedUnsignedInteger, Null
* Create function that reads format
* ByteOrder, Binary, Text, List, Record, Length, Choice, Group, Transform


* The execution model (from interpreter of captured expression tree 
to generated native state machine) is an implementation detail 
* Can be tuned and improved over time 


* This small library has been successfully used to handle 
* "Big data" formats like Avro, Ray, ProtoBuf, or Bond 
* Metadata formats embedded in JPEG, PNG, MP3, etc. 


Folding

* M scripts over massive scale-out systems like Cosmos run by folding: from M to Scope in the case of Cosmos
* Within Scope, a generic M extractor can be used to crack binary streams
* Can fold M to Scope
* The generated Scope script contains references to the generic M extractor and the extractor's parameterization through given M
* This is a form of "double folding"
* The folding part of the initial M script is split into an outer Scope script and one or more inner M extractor scripts

[Figure: M is split into M' and M''; the Scope script is generated from M' and calls Extractor(M'') with the nested M'']

Still need a 
driver, anyone? 


Elevators and washing 
machines have an 
interesting thing in 
common: they no longer 
require a human operator. 


And, yes, Google invented the 
self-driving car. Not. 


Power Query Data Sources

* Web page
* Excel or CSV/PSV/... file
* XML file, JSON file
* Text file
* Folder
* SQL Server database
* Windows Azure SQL database
* Access database
* Oracle database
* IBM DB2 database
* Sybase database
* Teradata database
* MySQL database
* PostgreSQL database
* SharePoint list
* OData feed
* Azure blob and table store
* Hadoop Distributed File System (HDFS)
* Windows Azure Marketplace feeds and services
* Active Directory
* Facebook graphs
* Exchange
* SAP Business Objects

Some sources are in public preview as of today.


Resources

* Power Query has shipped in two versions
* Standalone (v1) shipped in July 2013
* Corporate (v2) shipped in February 2014, part of Power BI offering, a subscription service aligned with
* Tutorials, samples, M language, and M library references
* http://office.microsoft.com/en-us/excel-help/micr -power-


